
    Adjacency maps and efficient graph algorithms

    Graph algorithms that test adjacencies are usually implemented with an adjacency-matrix representation, because the adjacency test takes constant time with adjacency matrices but time linear in the degree of the vertices with adjacency lists. In this article, we review the adjacency-map representation, which supports adjacency tests in constant expected time, and we show that graph algorithms run faster with adjacency maps than with adjacency lists: by a small constant factor if they do not test adjacencies, and by one or two orders of magnitude if they do.
    This research was partially supported by the Spanish Ministry of Science, Innovation and Universities and the European Regional Development Fund through project PGC2018-096956-B-C43 (FEDER/MICINN/AEI), and by the Agency for Management of University and Research Grants (AGAUR) through grant 2017-SGR-786 (ALBCOM).
    Peer reviewed. Postprint (published version).
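A minimal sketch of the idea, assuming an undirected graph and Python's built-in dict and set (both hash tables, so lookups take constant expected time). The names `AdjacencyMap`, `adjacent_in_list` and `count_triangles` are hypothetical and this is not the implementation benchmarked in the article. Triangle counting is used as an example of an algorithm that performs many adjacency tests.

```python
from collections import defaultdict
from itertools import combinations


class AdjacencyMap:
    """Hypothetical sketch: a hash map from each vertex to a hash set of neighbours."""

    def __init__(self):
        self._adj = defaultdict(set)

    def add_edge(self, u, v):
        # Undirected graph: store the edge in both directions.
        self._adj[u].add(v)
        self._adj[v].add(u)

    def vertices(self):
        return list(self._adj)

    def neighbors(self, u):
        return self._adj.get(u, set())

    def adjacent(self, u, v):
        # One dict lookup plus one set lookup: constant expected time.
        return v in self._adj.get(u, ())


def adjacent_in_list(adj_list, u, v):
    # Adjacency-list test, for contrast: scans u's list, linear in deg(u).
    return v in adj_list.get(u, [])


def count_triangles(graph):
    """Count triangles with one adjacency test per pair of neighbours."""
    found = 0
    for u in graph.vertices():
        for v, w in combinations(graph.neighbors(u), 2):
            if graph.adjacent(v, w):
                found += 1
    # Each triangle is seen once from each of its three vertices.
    return found // 3


if __name__ == "__main__":
    k4 = AdjacencyMap()
    for u, v in combinations(range(4), 2):
        k4.add_edge(u, v)
    print(count_triangles(k4))  # 4 triangles in the complete graph K4
```

With adjacency lists, each `adjacent` call in the inner loop would cost time linear in a vertex degree instead of constant expected time, which is where the difference reported above for adjacency-testing algorithms comes from.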

    The landscape of virus-host protein–protein interaction databases

    Knowledge of virus-host interactomes has advanced exponentially in the last decade through the use of high-throughput screening technologies, which give a more comprehensive landscape of virus-host protein–protein interactions. In this article, we present a systematic review of the available virus-host protein–protein interaction database resources. The resources covered in this review include both generic virus-host protein–protein interaction databases and databases of protein–protein interactions for a specific virus or for the viruses that infect a particular host. The databases are reviewed on the basis of their specificity for a particular virus or host, the number of virus-host protein–protein interactions they include, and their functionality in terms of browsing, search, visualization, and download. We also analyze the overlap of the databases, that is, the number of virus-host protein–protein interactions shared by the various databases, as well as the structure of the virus-host protein–protein interaction network across viruses and hosts.
    This research was partially supported by the Spanish Ministry of Science and Innovation and the European Regional Development Fund through project PID2021-126114NB-C44 (FEDER/MICINN/AEI).
    Peer reviewed. Postprint (published version).
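The overlap analysis can be sketched with set operations, assuming each database has already been reduced to a set of (viral protein, host protein) pairs under a common identifier scheme. The database names, protein identifiers and helper functions below are hypothetical placeholders, not data or code from the review.

```python
from itertools import combinations

# Hypothetical toy data: each database is a set of (viral protein, host protein)
# pairs, assumed to use the same identifier scheme across databases.
databases = {
    "db_A": {("V1", "H1"), ("V1", "H2"), ("V2", "H3")},
    "db_B": {("V1", "H1"), ("V2", "H3"), ("V3", "H4")},
    "db_C": {("V1", "H1")},
}


def pairwise_overlap(dbs):
    """Number of interactions shared by each pair of databases."""
    return {(a, b): len(dbs[a] & dbs[b]) for a, b in combinations(sorted(dbs), 2)}


def shared_by_all(dbs):
    """Interactions present in every database."""
    return set.intersection(*dbs.values())


if __name__ == "__main__":
    print(pairwise_overlap(databases))
    print(shared_by_all(databases))
```

In practice most of the effort lies in mapping each database's identifiers to a common namespace before the intersections are meaningful; the sketch assumes that step has been done.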

    Redundancy and subsumption in high-level replacement systems

    System verification in the broadest sense deals with those semantic properties that can be decided or deduced by analyzing a syntactic description of the system. Hence, the notions of redundancy and subsumption known from rule-based systems are natural to consider in this context. A rule is redundant if it can be removed without affecting the semantics of the system; it is subsumed by another rule if each application of the former can be replaced by an application of the latter with the same effect. In this paper, redundancy and subsumption are carried over from rule-based systems to high-level replacement systems, which in turn generalize graph and hypergraph grammars. The main results presented in this paper are a characterization of subsumption and a sufficient condition for redundancy that involves composite productions.
    Postprint (published version).
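To make the definition of redundancy concrete, here is a toy sketch in a much simpler setting than high-level replacement systems: a string-rewriting system whose semantics is assumed to be the set of words reachable from a given list of start words (explored up to a length bound so the search terminates). A rule is reported as redundant if removing it leaves that semantics unchanged. The functions and the length bound are hypothetical; this is not the characterization or the sufficient condition from the paper.

```python
from collections import deque


def successors(word, rules):
    """All words obtained by applying one rule (lhs -> rhs) at one position."""
    for lhs, rhs in rules:
        start = word.find(lhs)
        while start != -1:
            yield word[:start] + rhs + word[start + len(lhs):]
            start = word.find(lhs, start + 1)


def reachable(word, rules, max_len=12):
    """Breadth-first search over derivations, ignoring words longer than max_len."""
    seen = {word}
    queue = deque([word])
    while queue:
        current = queue.popleft()
        for nxt in successors(current, rules):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return frozenset(seen)


def redundant_rules(rules, start_words, max_len=12):
    """Rules whose removal does not change the reachable sets of the start words."""
    def semantics(rs):
        return {w: reachable(w, rs, max_len) for w in start_words}

    full = semantics(rules)
    return [rule for i, rule in enumerate(rules)
            if semantics(rules[:i] + rules[i + 1:]) == full]


if __name__ == "__main__":
    rules = [("a", "b"), ("b", "c"), ("a", "c")]
    print(redundant_rules(rules, ["a", "b"]))
```

In this example the rule `a -> c` is the composite of `a -> b` followed by `b -> c`, so removing it changes nothing, while the other two rules are needed. This only echoes, in a toy setting, the role of composite productions mentioned in the abstract.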

    Unbiased taxonomic annotation of metagenomic samples

    The classification of reads from a metagenomic sample using a reference taxonomy is usually based on first mapping the reads to the reference sequences and then classifying each read at the node, under the lowest common ancestor of the candidate sequences in the reference taxonomy, with the least classification error. However, this taxonomic annotation can be biased by an imbalanced taxonomy and also by the presence of multiple nodes in the taxonomy with the least classification error for a given read. In this paper, we show that the Rand index is a better indicator of classification error than the often-used area under the ROC curve and F-measure, for both balanced and imbalanced reference taxonomies. We also address the second source of bias by reducing the taxonomic annotation problem for a whole metagenomic sample to a set cover problem, for which a logarithmic approximation can be obtained in linear time.
    Peer reviewed. Postprint (author's final draft).
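Both ingredients can be sketched in a few lines. The Rand index below compares two labelings of the same reads (for example, true and predicted taxa) as the fraction of read pairs on which they agree about "same class" versus "different class". The set cover part is the textbook greedy algorithm, which gives a logarithmic approximation; this simple version is not linear time, and the mapping of reads and taxonomy nodes onto `universe` and `subsets` is an assumption for illustration, not the paper's reduction. All names are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two labelings agree (same vs. different)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


def greedy_set_cover(universe, subsets):
    """Greedy set cover: repeatedly pick the subset covering most uncovered items.

    universe: items to cover (e.g. reads); subsets: name -> set of items
    (e.g. a taxonomy node -> reads it could be assigned). Ties are broken by
    the insertion order of `subsets`.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gain = subsets[best] & uncovered
        if not gain:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= gain
    return chosen


if __name__ == "__main__":
    print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # 5 of 6 pairs agree
    print(greedy_set_cover({1, 2, 3, 4, 5},
                           {"A": {1, 2, 3}, "B": {2, 4}, "C": {3, 4}, "D": {4, 5}}))
```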

    The generalized Robinson-Foulds distance for phylogenetic trees

    The Robinson-Foulds (RF) distance, one of the most widely used metrics for comparing phylogenetic trees, has the advantage of being intuitive, with a natural interpretation in terms of common splits, and it can be computed in linear time. However, it has very low resolution, and it may become trivial for phylogenetic trees with overlapping taxa, that is, phylogenetic trees that share some but not all of their leaf labels. In this article, we study the properties of the Generalized Robinson-Foulds (GRF) distance, a recently proposed metric for comparing any structures that can be described by multisets of multisets of labels, when applied to rooted phylogenetic trees with overlapping taxa, which are described by sets of clusters, that is, by sets of sets of labels. We show that the GRF distance has very high resolution, that it can also be computed in linear time, and that it is not (uniformly) equivalent to the RF distance.
    This research was partially supported by the Spanish Ministry of Science, Innovation and Universities and the European Regional Development Fund through project PGC2018-096956-B-C43 (FEDER/MICINN/AEI), and by the Agency for Management of University and Research Grants (AGAUR) through grant 2017-SGR-786 (ALBCOM).
    Peer reviewed. Postprint (published version).
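For reference, the RF distance between rooted trees described by their sets of clusters can be sketched as the size of the symmetric difference of the two cluster sets. The sketch assumes trees written as nested tuples with string leaves; conventions differ between sources (some halve the count or exclude trivial clusters), and the GRF distance studied in the article is not implemented here. The function names are hypothetical.

```python
def clusters(tree):
    """Set of clusters (leaf sets below each internal node) of a rooted tree.

    A tree is either a leaf label (str) or a tuple of subtrees.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        found.add(below)
        return below

    visit(tree)
    return found


def rf_distance(t1, t2):
    """Number of clusters present in exactly one of the two trees."""
    return len(clusters(t1) ^ clusters(t2))


if __name__ == "__main__":
    print(rf_distance((("a", "b"), ("c", "d")), (("a", "c"), ("b", "d"))))
```

In the example the two trees share only the root cluster {a, b, c, d}, so the distance counts the four cherries. When the leaf sets only partially overlap, even the root clusters differ, which hints at why the RF distance can become trivial for trees with overlapping taxa.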

    Taxonomic assignment in metagenomics with TANGO

    One of the main computational challenges facing metagenomic analysis is the taxonomic identification of short DNA fragments. Combining sequence alignment methods with consensus-based taxonomic assignment can provide an accurate estimate of the microbial diversity in a sample. In this note, we show how recent improvements to these consensus methods, as implemented in the latest release of the TANGO tool, provide an improved estimate of diversity on simulated datasets.
    Peer reviewed. Postprint (published version).
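As an illustration of consensus-based assignment, the sketch below uses the simplest consensus rule: assign a read to the lowest common ancestor (LCA) of the taxa of its candidate alignments. TANGO's own consensus method is more refined and is not reproduced here. The toy taxonomy (with intermediate ranks omitted) and the function names are hypothetical.

```python
# Hypothetical toy taxonomy as child -> parent links (intermediate ranks omitted).
PARENT = {
    "Bacteria": "root",
    "Enterobacteriaceae": "Bacteria",
    "Bacillaceae": "Bacteria",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella": "Enterobacteriaceae",
    "Bacillus": "Bacillaceae",
}


def path_to_root(node, parent):
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(nodes, parent):
    """Deepest taxonomy node that is an ancestor of (or equal to) every candidate."""
    paths = [path_to_root(n, parent) for n in nodes]
    common = set(paths[0]).intersection(*paths[1:])
    for node in paths[0]:  # walk upwards from the first candidate
        if node in common:
            return node
    return None


if __name__ == "__main__":
    # A read whose best hits fall in two genera of the same family.
    print(lca(["Escherichia", "Salmonella"], PARENT))
```

The LCA rule is conservative: a single spurious hit in a distant lineage pushes the assignment far up the taxonomy, which is one motivation for more flexible consensus methods such as the one in TANGO.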

    A balance index for phylogenetic trees based on rooted quartets

    We define a new balance index for rooted phylogenetic trees based on the symmetry of the evolutionary history of every set of four leaves. This index makes sense for multifurcating trees, and it can be computed in time linear in the number of leaves. We determine its maximum and minimum values for arbitrary and bifurcating trees, and we provide exact formulas for its expected value and variance on bifurcating trees under Ford’s α-model and Aldous’ β-model, and on arbitrary trees under the α-γ-model.
    Peer reviewed. Postprint (author's final draft).
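The structure of such an index can be sketched by brute force: restrict the tree to every set of four leaves, identify which of the five rooted tree shapes on four leaves results, and sum a weight per shape. The weights below are placeholders ordered by the number of symmetries (automorphisms) of each shape; they are not the values defined in the paper. This sketch also enumerates all quartets, so it is far from the linear-time computation stated above. Trees are assumed to be nested tuples with string leaves, and all names are hypothetical.

```python
from itertools import combinations

# A leaf shape is (); an internal node is the sorted tuple of its children's shapes.
LEAF = ()
CHERRY = (LEAF, LEAF)
SHAPES = {
    "caterpillar": (LEAF, (LEAF, CHERRY)),         # (((a,b),c),d)
    "cherry_plus_two": (LEAF, LEAF, CHERRY),        # ((a,b),c,d)
    "triple_plus_one": (LEAF, (LEAF, LEAF, LEAF)),  # ((a,b,c),d)
    "balanced": (CHERRY, CHERRY),                   # ((a,b),(c,d))
    "star": (LEAF, LEAF, LEAF, LEAF),               # (a,b,c,d)
}
# Placeholder weights, increasing with the number of automorphisms (2, 4, 6, 8, 24).
PLACEHOLDER_WEIGHTS = {"caterpillar": 0, "cherry_plus_two": 1,
                       "triple_plus_one": 2, "balanced": 3, "star": 4}


def leaves(tree):
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def restricted_shape(tree, keep):
    """Shape of the tree restricted to `keep`, suppressing nodes left with one child."""
    if isinstance(tree, str):
        return LEAF if tree in keep else None
    shapes = [s for s in (restricted_shape(c, keep) for c in tree) if s is not None]
    if not shapes:
        return None
    if len(shapes) == 1:
        return shapes[0]
    return tuple(sorted(shapes))


def quartet_index(tree, weights=PLACEHOLDER_WEIGHTS):
    """Sum of shape weights over all 4-leaf subsets (brute force)."""
    name_of = {shape: name for name, shape in SHAPES.items()}
    return sum(weights[name_of[restricted_shape(tree, set(q))]]
               for q in combinations(leaves(tree), 4))


if __name__ == "__main__":
    print(quartet_index(("a", "b", "c", "d", "e")))  # five star quartets
```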

    BioMaS: a modular pipeline for Bioinformatic analysis of Metagenomic AmpliconS

    Background: Substantial advances in microbiology, molecular evolution and biodiversity have been made in recent years thanks to metagenomics, which makes it possible to unveil the composition and functions of mixed microbial communities in any environmental niche. If the investigation is aimed only at the taxonomic structure of the microbiome, a target-based metagenomic approach, here also referred to as meta-barcoding, is generally applied. This approach commonly involves the selective amplification of a species-specific genetic marker (DNA meta-barcode) across the whole taxonomic range of interest and the exploration of its taxon-related variants through High-Throughput Sequencing (HTS) technologies. Access to suitable computational systems for the large-scale bioinformatic analysis of HTS data is currently one of the major challenges in advanced meta-barcoding projects.
    Results: BioMaS (Bioinformatic analysis of Metagenomic AmpliconS) is a new bioinformatic pipeline designed to support biomolecular researchers involved in taxonomic studies of environmental microbial communities with a completely automated workflow comprising all the fundamental steps required in an appropriately designed meta-barcoding HTS-based experiment, from raw sequence data upload and cleaning to final taxonomic identification. In its current version, BioMaS supports the analysis of both bacterial and fungal environments starting directly from raw sequencing data from either the Roche 454 or the Illumina HTS platform, following two alternative paths, respectively. BioMaS is implemented as a public web service available at https://recasgateway.ba.infn.it/ and is also available in Galaxy at http://galaxy.cloud.ba.infn.it:8080 (Illumina data only).
    Conclusion: BioMaS is a user-friendly pipeline for meta-barcoding HTS data analysis, specifically designed for users without particular computing skills. A comparative benchmark, carried out on a simulated dataset designed to broadly represent the currently known bacterial and fungal world, showed that BioMaS outperforms QIIME and MOTHUR in terms of extent and accuracy of deep taxonomic sequence assignments.
    Peer reviewed. Postprint (published version).

    Constrained tree inclusion

    We consider the tree matching problem of determining, given labeled trees P and T, whether the pattern tree P can be obtained from the text tree T by deleting degree-one and degree-two nodes and, in the case of unordered trees, also by permuting siblings. This constrained tree inclusion problem is more sensitive to the structure of the pattern tree than the general tree inclusion problem, and it can be solved in polynomial time for both unordered and ordered trees. We present algorithms, based on the subtree homeomorphism algorithm of (Chung, 1987), that solve the constrained tree inclusion problem in O(m^1.5 n) time on unordered pattern and text trees with m and n nodes, respectively, and in O(mn) time on ordered trees, using O(mn) additional space. Using results of (Shamir and Tsur, 1999), these algorithms can be improved to run in O((m^1.5 / log m) n) and O((m / log m) n) time, respectively.
    Postprint (published version).
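A naive sketch of the unordered case, as we read the definition above: P is included in T if P's root can be mapped to some node of T with the same label, and the children of every mapped node can be matched to distinct children of its image, each pattern child being embedded somewhere in the subtree of the text child it is matched to. The intermediate text nodes are exactly those removed by deleting degree-one and degree-two nodes. Sibling matching uses a plain augmenting-path bipartite matching, so this makes no attempt to reach the bounds stated above. Trees are assumed to be `(label, children)` tuples, and all names are hypothetical.

```python
from functools import lru_cache

# A tree is a tuple (label, children), where children is a tuple of trees.


@lru_cache(maxsize=None)
def maps_at(p, t):
    """True if pattern p can be embedded with its root at text node t."""
    label_p, kids_p = p
    label_t, kids_t = t
    if label_p != label_t or len(kids_p) > len(kids_t):
        return False
    # candidates[i]: indices of text children whose subtree can host pattern child i.
    candidates = [[j for j, c in enumerate(kids_t) if maps_below(pc, c)]
                  for pc in kids_p]
    return match_all_pattern_children(candidates, len(kids_t))


@lru_cache(maxsize=None)
def maps_below(p, t):
    """True if p can be embedded at t or anywhere in t's subtree."""
    return maps_at(p, t) or any(maps_below(p, c) for c in t[1])


def match_all_pattern_children(candidates, n_text_children):
    """Augmenting-path bipartite matching covering every pattern child."""
    match_of_text = [-1] * n_text_children

    def try_assign(i, seen):
        for j in candidates[i]:
            if j not in seen:
                seen.add(j)
                if match_of_text[j] == -1 or try_assign(match_of_text[j], seen):
                    match_of_text[j] = i
                    return True
        return False

    return all(try_assign(i, set()) for i in range(len(candidates)))


def constrained_included(p, t):
    return maps_below(p, t)


if __name__ == "__main__":
    T = ("a", (("b", (("c", ()), ("d", ()))), ("e", ())))
    print(constrained_included(("a", (("c", ()), ("e", ()))), T))  # True
    print(constrained_included(("a", (("c", ()), ("d", ()))), T))  # False
```

The second query fails because obtaining `a` with children `c` and `d` would require deleting `b`, which has degree three; general tree inclusion would allow that deletion, constrained inclusion does not.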

    Grammatica: an implementation of algebraic graph transformation on Mathematica

    Grammatica is a prototype implementation of algebraic graph transformation based on relation algebra. It has been implemented in Mathematica on top of the Combinatorica package, and it therefore runs on most platforms. It consists of Mathematica routines for representing, manipulating, displaying and transforming graphs, as well as routines implementing some relation-algebraic operations on graphs. It supports both interactive and automatic application of double-pushout graph productions, which makes it both a teaching aid and a research tool for algebraic graph transformation.
    Postprint (published version).
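To give a flavour of relation-algebraic operations on graphs, the sketch below treats a directed graph's edge set as a binary relation and implements composition, converse and identity. Grammatica itself is written in Mathematica on top of Combinatorica; its routines are not reproduced here, and the Python function names are hypothetical.

```python
def compose(r, s):
    """Relational composition r;s: pairs (x, z) with (x, y) in r and (y, z) in s."""
    return frozenset((x, z) for (x, y) in r for (y2, z) in s if y == y2)


def converse(r):
    """Reverse every pair of the relation."""
    return frozenset((y, x) for (x, y) in r)


def identity(vertices):
    """Identity relation on a vertex set."""
    return frozenset((v, v) for v in vertices)


if __name__ == "__main__":
    E = frozenset({("a", "b"), ("b", "c")})  # edges of the path a -> b -> c
    print(compose(E, E))       # two-step paths: {('a', 'c')}
    print(E | converse(E))     # symmetric closure: the underlying undirected graph
```

Union and intersection of relations are ordinary set operations, so graph-level constructions can be written as short algebraic expressions over edge relations.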